Hash-based systems and methods for detecting and preventing transmission of unwanted e-mail

ABSTRACT

A system (120) detects transmission of potentially unwanted e-mail messages. The system (120) may receive e-mail messages and generate hash values based on one or more portions of the e-mail messages. The system (120) may then determine whether the generated hash values match hash values associated with prior e-mail messages. The system (120) may determine that one of the e-mail messages is a potentially unwanted e-mail message when one or more of the generated hash values associated with the e-mail message match one or more of the hash values associated with the prior e-mail messages.

RELATED APPLICATIONS

[0001] This application claims priority under 35 U.S.C. § 119 based on U.S. Provisional Application No. 60/407,975, filed Sep. 5, 2002, the disclosure of which is incorporated herein by reference. This application is also a continuation-in-part of U.S. patent application Ser. No. 10/251,403, filed Sep. 20, 2002, which claims priority under 35 U.S.C. § 119 based on U.S. Provisional Application No. 60/341,462, filed Dec. 14, 2001, both of which are incorporated herein by reference. This application is also a continuation-in-part of U.S. patent application Ser. No. 09/881,145, and U.S. patent application Ser. No. 09/881,074, both of which were filed on Jun. 14, 2001, and both of which claim priority under 35 U.S.C. § 119 based on U.S. Provisional Application No. 60/212,425, filed Jun. 19, 2000, all of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to network security and, more particularly, to systems and methods for detecting and/or preventing the transmission of unwanted e-mails, such as e-mails containing worms and viruses, including polymorphic worms and viruses, and unsolicited commercial e-mails.

[0004] 2. Description of Related Art

[0005] The availability of low-cost computers, high-speed networking products, and readily available network connections has helped fuel the proliferation of the Internet. This proliferation has caused the Internet to become an essential tool for both the business community and private individuals. Dependence on the Internet arises, in part, because the Internet makes it possible for multitudes of users to access vast amounts of information and perform remote transactions expeditiously and efficiently. Along with the rapid growth of the Internet have come problems arising from attacks from within the network and the sheer volume of commercial e-mail. As the size of the Internet continues to grow, so does the threat posed to users of the Internet.

[0006] Many of the problems take the form of e-mail. Viruses and worms often masquerade within e-mail messages for execution by unsuspecting e-mail recipients. Unsolicited commercial e-mail, or “spam,” is another burdensome type of e-mail because it wastes both the time and resources of the e-mail recipient.

[0007] Existing techniques for detecting viruses, worms, and spam examine each e-mail message individually. In the case of viruses and worms, this typically means examining attachments for byte-strings found in known viruses and worms (possibly after uncompressing or de-archiving attached files), or simulating execution of the attachment in a “safe” compartment and examining its behaviors. Similarly, existing spam filters usually examine a single e-mail message looking for heuristic traits commonly found in unsolicited commercial e-mail, such as an abundance of Uniform Resource Locators (URLs), heavy use of all-capital-letter words, use of colored text or large fonts, and the like, and then “score” the message based on the number and types of such traits found. Both the anti-virus and the anti-spam techniques can demand significant processing of each message, adding to the resource burden imposed by unwanted e-mail. Neither technique makes use of information collected from other recent messages.

[0008] Thus, there is a need for an efficient technique that can quickly detect viruses, worms, and spam in e-mail messages arriving at e-mail servers, possibly by using information contained in multiple recent messages to detect unwanted mail more quickly and efficiently.

SUMMARY OF THE INVENTION

[0009] Systems and methods consistent with the present invention address this and other needs by providing a new defense that detects and prevents the transmission of unwanted (and potentially unwanted) e-mail, such as e-mails containing viruses, worms, and spam.

[0010] In accordance with an aspect of the invention as embodied and broadly described herein, a method for detecting transmission of potentially unwanted e-mail messages is provided. The method includes receiving e-mail messages and generating hash values based on one or more portions of the e-mail messages. The method further includes determining whether the generated hash values match hash values associated with prior e-mail messages. The method may also include determining that one of the e-mail messages is a potentially unwanted e-mail message when one or more of the generated hash values associated with the e-mail message match one or more of the hash values associated with the prior e-mail messages.

[0011] In accordance with another aspect of the invention, a mail server includes one or more hash memories and a hash processor. The one or more hash memories is/are configured to store count values associated with hash values. The hash processor is configured to receive an e-mail message, hash one or more portions of the e-mail message to generate hash values, and increment the count values corresponding to the generated hash values. The hash processor is further configured to determine whether the e-mail message is a potentially unwanted e-mail message based on the incremented count values.

[0012] In accordance with yet another aspect of the invention, a method for detecting transmission of unwanted e-mail messages is provided. The method includes receiving e-mail messages and detecting unwanted e-mail messages among the received e-mail messages based on hashes of previously received e-mail messages, where multiple hashes are performed on each of the e-mail messages.

[0013] In accordance with a further aspect of the invention, a method for detecting transmission of potentially unwanted e-mail messages is provided. The method includes receiving an e-mail message; generating hash values over blocks of the e-mail message, where the blocks include at least two of a main text portion, an attachment portion, and a header portion of the e-mail message; determining whether the generated hash values match hash values associated with prior e-mail messages; and determining that the e-mail message is a potentially unwanted e-mail message when one or more of the generated hash values associated with the e-mail message match one or more of the hash values associated with the prior e-mail messages.

[0014] In accordance with another aspect of the invention, a mail server in a network of cooperating mail servers is provided. The mail server includes one or more hash memories and a hash processor. The one or more hash memories is/are configured to store information relating to hash values corresponding to previously-observed e-mails. The hash processor is configured to receive at least some of the hash values from another one or more of the cooperating mail servers and store information relating to the at least some of the hash values in at least one of the one or more hash memories. The hash processor is further configured to receive an e-mail message, hash one or more portions of the received e-mail message to generate hash values, determine whether the generated hash values match the hash values corresponding to previously-observed e-mails, and identify the received e-mail message as a potentially unwanted e-mail message when one or more of the generated hash values associated with the received e-mail message match one or more of the hash values corresponding to previously-observed e-mails.

[0015] In accordance with yet another aspect of the invention, a mail server is provided. The mail server includes one or more hash memories and a hash processor. The one or more hash memories is/are configured to store count values associated with hash values. The hash processor is configured to receive e-mail messages, hash one or more portions of the received e-mail messages to generate hash values, increment the count values corresponding to the generated hash values, and generate suspicion scores for the received e-mail messages based on the incremented count values.

[0016] In accordance with a further aspect of the invention, a method for preventing transmission of unwanted e-mail messages is provided. The method includes receiving an e-mail message; generating hash values over portions of the e-mail message as the e-mail message is being received; and incrementally determining whether the generated hash values match hash values associated with prior e-mail messages. The method further includes generating a suspicion score for the e-mail message based on the incremental determination, and rejecting the e-mail message when the suspicion score of the e-mail message is above a threshold.
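The incremental rejection described in [0016] can be illustrated with a short sketch. It is not the claimed implementation: it assumes the message arrives as a sequence of byte chunks, that digests of prior messages' fixed-size blocks are kept in a Python set, and that MD5, the 64-byte block size, and the threshold of 3 are arbitrary illustrative choices. The names `receive_incrementally`, `BLOCK_SIZE`, and `THRESHOLD` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 64   # hypothetical block size, in bytes
THRESHOLD = 3     # hypothetical rejection threshold

def receive_incrementally(chunks, seen_hashes, threshold=THRESHOLD):
    """Score a message block by block while it streams in.

    Returns ("reject", score) as soon as the score rises above `threshold`,
    otherwise ("accept", score) once all chunks have been consumed.
    """
    buffer = b""
    score = 0
    for chunk in chunks:                      # e.g., pieces of the SMTP DATA stream
        buffer += chunk
        while len(buffer) >= BLOCK_SIZE:
            block, buffer = buffer[:BLOCK_SIZE], buffer[BLOCK_SIZE:]
            if hashlib.md5(block).digest() in seen_hashes:
                score += 1                    # this block was seen in prior e-mail
                if score > threshold:
                    return "reject", score    # stop before reading the rest
    return "accept", score                    # a trailing partial block is ignored here
```

Because the score is checked after every block, a server organized this way could refuse a message before the remainder of a large attachment has been transferred.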

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings,

[0018] FIG. 1 is a diagram of a system in which systems and methods consistent with the present invention may be implemented;

[0019] FIG. 2 is an exemplary diagram of the e-mail server of FIG. 1 according to an implementation consistent with the principles of the invention;

[0020] FIG. 3 is an exemplary functional block diagram of the e-mail server of FIG. 2 according to an implementation consistent with the principles of the invention;

[0021] FIG. 4 is an exemplary diagram of the hash processing block of FIG. 3 according to an implementation consistent with the principles of the invention; and

[0022] FIGS. 5A-5E are flowcharts of exemplary processing for detecting and/or preventing transmission of an unwanted e-mail message, such as an e-mail containing a virus or worm, including a polymorphic virus or worm, or an unsolicited commercial e-mail, according to an implementation consistent with the principles of the invention.

DETAILED DESCRIPTION

[0023] The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents.

[0024] Systems and methods consistent with the present invention provide virus, worm, and unsolicited e-mail detection and/or prevention in e-mail servers. Placing these features in e-mail servers provides a number of new advantages, including the ability to align hash blocks to crucial boundaries found in e-mail messages and eliminate certain counter-measures by the attacker, such as using small Internet Protocol (IP) fragments to limit the detectable content in each packet. It also allows these features to relate e-mail header fields with the potentially-harmful segment of the message (usually an “attachment”), and decode common file-packing and encoding formats that might otherwise make a virus or worm undetectable by the packet-based technique (e.g., “.zip files”).

[0025] Placing these features within an e-mail server makes it possible to detect replicated content in the network at points where large quantities of traffic are present. By relating many otherwise-independent messages and finding common factors, the e-mail server may detect unknown, as well as known, viruses and worms. These features may also be applied to detect potential unsolicited commercial e-mail (“spam”).

[0026] E-mail servers for major Internet Service Providers (ISPs) may process a million e-mail messages a day, or more, in a single server. When viruses and worms are active in the network, a substantial fraction of this e-mail may actually be traffic generated by the virus or worm. Thus, an e-mail server may have dozens to thousands of examples of a single e-mail-borne virus pass through it in a day, offering an excellent opportunity to determine the relationships between e-mail messages and detect replicated content (a feature that is indicative of virus/worm propagation) and spam, among other, more legitimate traffic (such as traffic from legitimate mailing lists).

[0027] Systems and methods consistent with the principles of the invention provide mechanisms to detect and stop e-mail-borne viruses and worms before the addressed user receives them, in an environment where the virus is still inert. Current e-mail servers do not normally execute any code in the e-mail being transported, so they are not usually subject to virus/worm infections from the content of the e-mails they process, though they may be subject to infection via other forms of attack.

[0028] Besides e-mail-borne viruses and worms, another common problem found in e-mail is mass-e-mailing of unsolicited commercial e-mail, colloquially referred to as “spam.” It is estimated that perhaps 25%-50% of all e-mail messages now received for delivery by major ISP e-mail servers are spam.

[0029] Users of network e-mail services are desirous of mechanisms to block e-mail containing viruses or worms from reaching their machines (where the virus or worm may easily do harm before the user realizes its presence). Users are also desirous of mechanisms to block unsolicited commercial e-mail that consumes their time and resources.

[0030] Many commercial e-mail services limit the amount of each user's e-mail that may accumulate at the server without yet having been downloaded to the customer's machine. If too much e-mail arrives between the times when the user reads e-mail, additional e-mail is either “bounced” (i.e., returned to the sender's e-mail server) or even simply discarded, both of which events can seriously inconvenience the user. Because the user has no control over arriving e-mail due to e-mail-borne viruses/worms, or spam, it is a relatively common occurrence that the user's e-mail quota overflows due to unwanted and potentially harmful messages. Similarly, the authors of e-mail-borne viruses, as well as senders of spam, have no reason to limit the size of their messages. As a result, these messages are often much larger than legitimate e-mail messages, thereby increasing the risk of such denial of service to the user by overflowing the per-user e-mail quota.

[0031] Users are not the only group inconvenienced by spam and e-mail-borne viruses and worms. Because these types of unwanted e-mail can form a substantial fraction, even a majority, of e-mail traffic on the Internet for extended periods of time, ISPs typically must add extra resources to handle a peak e-mail load that would otherwise be about half as large. This ratio of unwanted-to-legitimate e-mail traffic appears to be growing daily. Systems and methods consistent with the principles of the invention provide mechanisms to detect and discard unwanted e-mail in network e-mail servers.

Exemplary System Configuration

[0032] FIG. 1 is a diagram of an exemplary system 100 in which systems and methods consistent with the present invention may be implemented. System 100 includes mail clients 110 connected to a mail server 120 via a network 130. Connections made in system 100 may be via wired, wireless, and/or optical communication paths. While FIG. 1 shows three mail clients 110 and a single mail server 120, there can be more or fewer clients and servers in other implementations consistent with the principles of the invention.

[0033] Network 130 may facilitate communication between mail clients 110 and mail server 120. Typically, network 130 may include a collection of network devices, such as routers or switches, that transfer data between mail clients 110 and mail server 120. In an implementation consistent with the present invention, network 130 may take the form of a wide area network, a local area network, an intranet, the Internet, a public telephone network, a different type of network, or a combination of networks.

[0034] Mail clients 110 may include personal computers, laptops, personal digital assistants, or other types of wired or wireless devices that are capable of interacting with mail server 120 to receive e-mails. In another implementation, clients 110 may include software operating upon one of these devices. Client 110 may present e-mails to a user via a graphical user interface.

[0035] Mail server 120 may include a computer or another device that is capable of providing e-mail services for mail clients 110. In another implementation, server 120 may include software operating upon one of these devices.

[0036] FIG. 2 is an exemplary diagram of mail server 120 according to an implementation consistent with the principles of the invention. Server 120 may include bus 210, processor 220, main memory 230, read-only memory (ROM) 240, storage device 250, input device 260, output device 270, and communication interface 280. Bus 210 permits communication among the components of server 120.

[0037] Processor 220 may include any type of conventional processor or microprocessor that interprets and executes instructions. Main memory 230 may include a random access memory (RAM) or another type of dynamic storage device that stores information and instructions for execution by processor 220. ROM 240 may include a conventional ROM device or another type of static storage device that stores static information and instructions for use by processor 220. Storage device 250 may include a magnetic and/or optical recording medium and its corresponding drive.

[0038] Input device 260 may include one or more conventional mechanisms that permit an operator to input information to server 120, such as a keyboard, a mouse, a pen, voice recognition and/or biometric mechanisms, etc. Output device 270 may include one or more conventional mechanisms that output information to the operator, such as a display, a printer, a pair of speakers, etc. Communication interface 280 may include any transceiver-like mechanism that enables server 120 to communicate with other devices and/or systems. For example, communication interface 280 may include mechanisms for communicating with another device or system via a network, such as network 130.

[0039] As will be described in detail below, server 120, consistent with the present invention, provides e-mail services to clients 110, while detecting unwanted e-mails and/or preventing unwanted e-mails from reaching clients 110. Server 120 may perform these tasks in response to processor 220 executing sequences of instructions contained in, for example, memory 230. These instructions may be read into memory 230 from another computer-readable medium, such as storage device 250 or a carrier wave, or from another device via communication interface 280.

[0040] Execution of the sequences of instructions contained in memory 230 may cause processor 220 to perform processes that will be described later. Alternatively, hardwired circuitry may be used in place of or in combination with software instructions to implement processes consistent with the present invention. Thus, processes performed by server 120 are not limited to any specific combination of hardware circuitry and software.

[0041] FIG. 3 is an exemplary functional block diagram of mail server 120 according to an implementation consistent with the principles of the invention. Server 120 may include a Simple Mail Transfer Protocol (SMTP) block 310, a Post Office Protocol (POP) block 320, an Internet Message Access Protocol (IMAP) block 330, and a hash processing block 340.

[0042] SMTP block 310 may permit mail server 120 to communicate with other mail servers connected to network 130 or another network. SMTP is designed to efficiently and reliably transfer e-mail across networks. SMTP defines the interaction between mail servers to facilitate the transfer of e-mail even when the mail servers are implemented on different types of computers or running different operating systems.

[0043] POP block 320 may permit mail clients 110 to retrieve e-mail from mail server 120. POP block 320 may be designed to always receive incoming e-mail. POP block 320 may then hold e-mail for mail clients 110 until mail clients 110 connect to download it.

[0044] IMAP block 330 may provide another mechanism by which mail clients 110 can retrieve e-mail from mail server 120. IMAP block 330 may permit mail clients 110 to access remote e-mail as if the e-mail were local to mail clients 110.

[0045] Hash processing block 340 may interact with SMTP block 310, POP block 320, and/or IMAP block 330 to detect and prevent transmission of unwanted e-mail, such as e-mails containing viruses or worms and unsolicited commercial e-mail (spam).

[0046] FIG. 4 is an exemplary diagram of hash processing block 340 according to an implementation consistent with the principles of the invention. Hash processing block 340 may include hash processor 410 and one or more hash memories 420. Hash processor 410 may include a conventional processor, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or some other type of device that generates one or more representations for each received e-mail and records the e-mail representations in hash memory 420.

[0047] An e-mail representation will likely not be a copy of the entire e-mail, but rather it may include a portion of the e-mail or some unique value representative of the e-mail. For example, a fixed-width number may be computed across portions of the e-mail in a manner that allows the entire e-mail to be identified.

[0048] To further illustrate the use of representations, a 32-bit hash value, or digest, may be computed across portions of each e-mail. Then the hash value may be stored in hash memory 420 or may be used as an index, or address, into hash memory 420. Using the hash value, or an index derived therefrom, results in efficient use of hash memory 420 while still allowing the content of each e-mail passing through mail server 120 to be identified.
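Using a 32-bit digest as an address into hash memory 420 might look like the following sketch. CRC-32 (via Python's `zlib.crc32`) stands in for whichever 32-bit hash an implementation chooses, and the 2**20-slot table and the names `digest32`, `memory_index`, and `hash_memory` are hypothetical. Because 2**32 possible digests are folded onto 2**20 addresses, unrelated portions can share an address, which is the collision ambiguity discussed in [0053].

```python
import zlib

TABLE_BITS = 20                # hypothetical: hash memory with 2**20 slots
TABLE_SIZE = 1 << TABLE_BITS

def digest32(portion: bytes) -> int:
    """32-bit digest of an e-mail portion (CRC-32 chosen only for illustration)."""
    return zlib.crc32(portion) & 0xFFFFFFFF

def memory_index(portion: bytes) -> int:
    """Use the low TABLE_BITS bits of the digest as an address into hash memory."""
    return digest32(portion) & (TABLE_SIZE - 1)

hash_memory = bytearray(TABLE_SIZE)   # one byte of state per address

hash_memory[memory_index(b"Subject: hello")] = 1   # record that this portion was seen
```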

[0049] Systems and methods consistent with the present invention may use any storage scheme that records information about one or more portions of each e-mail in a space-efficient fashion, that can definitively determine if a portion of an e-mail has not been observed, and that can respond positively (i.e., in a predictable way) when a portion of an e-mail has been observed. Although systems and methods consistent with the present invention can use virtually any technique for deriving representations of portions of e-mails, the remaining discussion will use hash values as exemplary representations of portions of e-mails received by mail server 120.

[0050] In implementations consistent with the principles of the invention, hash processor 410 may hash one or more portions of a received e-mail to produce a hash value used to facilitate hash-based detection. For example, hash processor 410 may hash one or more of the main text within the message body, any attachments, and one or more header fields, such as sender-related fields (e.g., “From:,” “Sender:,” “Reply-To:,” “Return-Path:,” and “Error-To:”). Hash processor 410 may perform one or more hashes on each of the e-mail portions using the same or different hash functions.
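Hashing the sender-related header fields listed above could be sketched with Python's standard `email` parser. The normalization step (collapsing whitespace and ignoring case before hashing) is an assumption added for illustration, as is the choice of MD5; `sender_field_hashes` and `SENDER_FIELDS` are hypothetical names.

```python
import hashlib
from email import message_from_string

# Sender-related fields named in the text.
SENDER_FIELDS = ("From", "Sender", "Reply-To", "Return-Path", "Error-To")

def sender_field_hashes(raw_message: str) -> dict:
    """Return an MD5 hex digest for each sender-related header that is present."""
    msg = message_from_string(raw_message)
    hashes = {}
    for field in SENDER_FIELDS:
        value = msg.get(field)
        if value is None:
            continue
        # Assumed normalization: collapse whitespace and ignore case.
        normalized = " ".join(str(value).split()).lower()
        hashes[field] = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return hashes
```

Each resulting digest could then be looked up in its own hash memory, so a repeated "Reply-To:" value is counted independently of the message body.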

[0051] As described in more detail below, hash processor 410 may use the hash results of the hash operation to recognize duplicate occurrences of e-mails and raise a warning if the duplicate e-mail occurrences arrive within a short period of time and raise their level of suspicion above some threshold. It may also be possible to use the hash results for tracing the path of an unwanted e-mail through the network.

[0052] Each hash value may be determined by taking an input block of data and processing it to obtain a numerical value that represents the given input data. Suitable hash functions are readily known in the art and will not be discussed in detail herein. Examples of hash functions include the Cyclic Redundancy Check (CRC) and Message Digest 5 (MD5). The resulting hash value, also referred to as a message digest or hash digest, may include a fixed-length value. The hash value may serve as a signature for the data over which it was computed.
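Both example hash functions are available in Python's standard library, which makes the fixed-length property easy to see. The helper names below are hypothetical wrappers around `zlib.crc32` and `hashlib.md5`.

```python
import hashlib
import zlib

def crc32_digest(block: bytes) -> int:
    """CRC-32 of a block: an unsigned 32-bit integer."""
    return zlib.crc32(block)

def md5_digest(block: bytes) -> bytes:
    """MD5 of a block: always 16 bytes (128 bits), whatever the input length."""
    return hashlib.md5(block).digest()

short, long_ = b"hi", b"x" * 100_000
print(len(md5_digest(short)), len(md5_digest(long_)))   # 16 16
```

A two-byte block and a 100,000-byte block both yield a 16-byte MD5 digest and a value below 2**32 from CRC-32.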

[0053] The hash value essentially acts as a fingerprint identifying the input block of data over which it was computed. Unlike fingerprints, however, there is a chance that two very different pieces of data will hash to the same value, resulting in a hash collision. An acceptable hash function should provide a good distribution of values over a variety of data inputs in order to minimize these collisions. Because collisions occur when different input blocks result in the same hash value, an ambiguity may arise when attempting to associate a result with a particular input.

[0054] Hash processor 410 may store a representation of each e-mail it observes in hash memory 420. Hash processor 410 may store the actual hash values as the e-mail representations or it may use other techniques for minimizing storage requirements associated with retaining hash values and other information associated therewith. A technique for minimizing storage requirements may use one or more arrays or Bloom filters.
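A Bloom filter satisfies the storage requirements stated in [0049]: it never reports "not seen" for a portion that was added, although it can occasionally report "seen" for one that was not. The sketch below is a generic Bloom filter, not the patent's specific design; the bit-array size, the number of hash functions, and deriving the k positions from one SHA-256 digest by double hashing are all illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        d = hashlib.sha256(item).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1    # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```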

[0055] Rather than storing the actual hash value, which can typically be on the order of 32 bits or more in length, hash processor 410 may use the hash value as an index for addressing an array within hash memory 420. In other words, when hash processor 410 generates a hash value for a portion of an e-mail, the hash value serves as the address location into the array. At the address corresponding to the hash value, a count value may be incremented at the respective storage location, thus indicating that a particular hash value, and hence a particular e-mail portion, has been seen by hash processor 410. In one implementation, the count value is associated with an 8-bit counter with a maximum value that sticks at 255. While counter arrays are described by way of example, it will be appreciated by those skilled in the relevant art that other storage techniques may be employed without departing from the spirit of the invention.
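An array of 8-bit counters that stick at 255 maps naturally onto a Python `bytearray`. In this sketch the class name, the 2**16-slot size, and the use of CRC-32 for addressing are hypothetical choices.

```python
import zlib

class CounterArray:
    """Hash memory of 8-bit counters indexed by hash value; counts stick at 255."""
    MAX_COUNT = 255

    def __init__(self, size_bits: int = 16):
        self.mask = (1 << size_bits) - 1
        self.counts = bytearray(1 << size_bits)   # each element holds 0..255

    def index(self, portion: bytes) -> int:
        return zlib.crc32(portion) & self.mask

    def increment(self, portion: bytes) -> int:
        """Record one more sighting of `portion` and return its (saturated) count."""
        i = self.index(portion)
        if self.counts[i] < self.MAX_COUNT:
            self.counts[i] += 1
        return self.counts[i]
```

Saturating instead of wrapping matters here: a heavily replicated worm body would otherwise overflow back to a small count and look innocuous.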

[0056] Hash memory 420 may store a suspicion count that is used to determine the overall suspiciousness of an e-mail message. For example, the count value (described above) may be compared to a threshold, and the suspicion count for the e-mail may be incremented if the threshold is exceeded. Hence, there may be a direct relationship between the count value and the suspicion count, and it may be possible for the two values to be the same. The larger the suspicion count, the more important the hit should be considered in determining the overall suspiciousness of the message. Alternatively, the suspicion count can be combined in a “scoring function” with values from this or other hash blocks in the same message in order to determine whether the message should be considered suspicious.
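One simple reading of the suspicion count, given the count values of all hash blocks in a single message, follows. The thresholds and the "flag when enough blocks look replicated" rule are hypothetical; the text leaves the actual scoring function open.

```python
def suspicion_count(count_values, count_threshold):
    """Number of hash blocks in one message whose count value exceeds count_threshold."""
    return sum(1 for c in count_values if c > count_threshold)

def score_message(count_values, count_threshold=10, suspicion_threshold=2):
    """A trivial scoring function: flag the message once enough blocks look replicated."""
    return suspicion_count(count_values, count_threshold) >= suspicion_threshold
```

For example, a message whose blocks have count values `[0, 11, 50, 3]` has a suspicion count of 2 and would be flagged under these illustrative thresholds.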

[0057] It is not enough, however, for hash memory 420 to simply identify that an e-mail contains content that has been seen recently. There are many legitimate sources (e.g., e-mail list servers) that produce multiple copies of the same message, addressed to multiple recipients. Similarly, individual users often e-mail messages to a group of people and, thus, multiple copies might be seen if several recipients happen to receive their mail from the same server. Also, people often forward copies of received messages to friends or co-workers.

[0058] In addition, virus/worm authors typically try to minimize the replicated content in each copy of the virus/worm, in order to avoid detection by existing virus and worm detection technology that depends on detecting fixed sequences of bytes in a known virus or worm. These mutable viruses/worms are usually known as polymorphic, and the attacker's goal is to minimize the recognizability of the virus or worm by scrambling each copy in a different way. For the virus or worm to remain viable, however, a small part of it can be mutable in only a relatively small number of ways, because some of its code must be immediately executable by the victim's computer, and that limits the mutation and obscurement possibilities for the critical initial code part.

[0059] In order to accomplish the proper classification of various types of legitimate and unwanted e-mail messages, multiple hash memories 420 can be employed, with separate hash memories 420 being used for specific sub-parts of a standard e-mail message. The outputs of different ones of hash memories 420 can then be combined in an overall “scoring” or classification function to determine whether the message is undesirable or legitimate, and possibly estimate the probability that it belongs to a particular class of traffic, such as a virus/worm message, spam, an e-mail list message, or a normal user-to-user message.

[0060] For e-mail following the Internet mail standard RFC 822 (and its various extensions), hashing of certain individual e-mail header fields into field-specific hash memories 420 may be useful. Among the header fields for which this may be helpful are: (1) various sender-related fields, such as “From:”, “Sender:”, “Reply-To:”, “Return-Path:” and “Error-To:”; (2) the “To:” field (often a fixed value for a mailing list, frequently missing or idiosyncratic in spam messages); and (3) the last few “Received:” headers (i.e., the earliest ones, since they are normally added at the top of the message), excluding any obvious timestamp data. It may also be useful to hash a combination of the “From:” field and the e-mail address of the recipient (transferred as part of the SMTP mail-transfer protocol, and not necessarily found in the message itself).
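A sketch of computing these field-specific digests follows. It assumes that the timestamp in a "Received:" header is everything after the final semicolon (the trace-field layout used by the Internet mail standards) and that the SMTP envelope recipient is passed in separately. The memory names (`"to"`, `"received0"`, `"from+rcpt"`), the choice of keeping two "Received:" headers, and the normalization are all hypothetical.

```python
import hashlib
from email import message_from_string

def _h(text: str) -> str:
    # Collapse whitespace (including header folding) and ignore case before hashing.
    return hashlib.md5(" ".join(text.split()).lower().encode("utf-8")).hexdigest()

def field_hashes(raw_message: str, envelope_recipient: str, last_received: int = 2) -> dict:
    """Map hypothetical field-specific memory names to digests for one message."""
    msg = message_from_string(raw_message)
    out = {}
    if msg.get("To") is not None:
        out["to"] = _h(str(msg["To"]))
    received = msg.get_all("Received") or []
    # Received headers are prepended, so the last ones listed are the earliest hops.
    for n, value in enumerate(received[-last_received:]):
        # The date-time follows the final ";" in a Received header; drop it.
        out[f"received{n}"] = _h(str(value).rsplit(";", 1)[0])
    if msg.get("From") is not None:
        # The recipient comes from the SMTP envelope (RCPT TO), not from the headers.
        out["from+rcpt"] = _h(str(msg["From"]) + "|" + envelope_recipient)
    return out
```

Stripping the timestamp lets two copies relayed along the same path at different times produce the same "Received:" digest.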

[0061] Any or all of hash memories 420 may be pre-loaded with knowledge of known good or bad traffic. For example, known viruses and spam content (e.g., the infamous “Craig Shergold letter” or many pyramid swindle letters) can be pre-hashed into the relevant hash memories 420, and/or periodically refreshed in the memory as part of a periodic “cleaning” process described below. Also, known legitimate mailing lists, such as mailing lists from legitimate e-mail list servers, can be added to a “From:” hash memory 420 that passes traffic without further examination.
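Pre-loading can be pictured as two seeded digest sets, checked before the normal counting path. Everything in this sketch is hypothetical: the sample chain-letter text, the list address, the normalization, and the three-way `preclassify` result.

```python
import hashlib

def digest(text: str) -> str:
    # Assumed normalization: collapse whitespace and ignore case.
    return hashlib.md5(" ".join(text.split()).lower().encode("utf-8")).hexdigest()

# Hypothetical pre-loaded knowledge of known-bad content and known-good senders.
KNOWN_BAD_BODIES = {digest("Make money fast! Send $5 to each name on this list.")}
TRUSTED_FROM = {digest("announce@lists.example.org")}

def preclassify(from_field: str, body: str) -> str:
    if digest(from_field) in TRUSTED_FROM:
        return "pass"        # known legitimate list: skip further examination
    if digest(body) in KNOWN_BAD_BODIES:
        return "unwanted"    # matches pre-hashed known-bad content
    return "examine"         # fall through to the normal counting path
```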

[0062] Over time, hash memories 420 may fill up and the possibility of overflowing an existing count value increases. The risk of overflowing a count value may be reduced if the counter arrays are periodically flushed to other storage media, such as a magnetic disk drive, optical media, solid state drive, or the like. Alternatively, the counter arrays may be slowly and incrementally erased. To facilitate this, a timetable may be established for flushing/erasing the counter arrays. If desired, the flushing/erasing cycle can be reduced by computing hash values only for a subset of the e-mails received by mail server 120. While this approach reduces the flushing/erasing cycle, it increases the possibility that a target e-mail may be missed (i.e., a hash value is not computed over a portion of it).

[0063] Non-zero storage locations within hash memories 420 may be decremented periodically rather than being erased. This may ensure that the “random noise” from normal e-mail traffic does not remain in a counter array indefinitely. Replicated traffic (e.g., e-mails containing a virus/worm that are propagating repeatedly across the network), however, would normally cause the relevant storage locations to stay substantially above the “background noise” level.

[0064] One way to decrement the count values in the counter array fairly is to keep a total count, for each hash memory 420, of every time one of the count values is incremented. After this total count reaches some threshold value (probably in the millions), each time a count value is incremented in hash memory 420, another count value gets decremented. One way to pick the count value to decrement is to keep a counter, as a decrement pointer, that simply iterates through the storage locations sequentially. Every time a decrement operation is performed, the following may be done: (a) examine the candidate count value to be decremented and if non-zero, decrement it and increment the decrement pointer to the next storage location; and (b) if the candidate count value is zero, then examine each sequentially-following storage location until a non-zero count value is found, decrement that count value, and advance the decrement pointer to the following storage location.
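Steps (a) and (b) above can be sketched as follows. The class name and parameters are hypothetical; `warmup_total` plays the role of the threshold on the total increment count, and the counters saturate at 255 as in [0055]. Once warm-up is over, every increment is paired with one decrement, so the total population stays fixed unless a counter is already saturated.

```python
class DecayingCounterArray:
    """Counter array with a pointer-based fair decrement (steps (a) and (b))."""
    MAX_COUNT = 255   # 8-bit counters that stick at 255

    def __init__(self, size: int, warmup_total: int):
        self.counts = [0] * size
        self.total_increments = 0
        self.warmup_total = warmup_total   # the text suggests "probably in the millions"
        self.decrement_pointer = 0

    def increment(self, index: int) -> None:
        if self.counts[index] < self.MAX_COUNT:
            self.counts[index] += 1
        self.total_increments += 1
        if self.total_increments > self.warmup_total:
            self._decrement_one()

    def _decrement_one(self) -> None:
        size = len(self.counts)
        for _ in range(size):                          # at most one full sweep
            i = self.decrement_pointer
            self.decrement_pointer = (i + 1) % size    # pointer always moves past i
            if self.counts[i] > 0:                     # (a) non-zero: decrement, stop
                self.counts[i] -= 1
                return
            # (b) zero: keep scanning the following storage locations
```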

[0065] It may be important to avoid decrementing any counters below zero, while not biasing decrements unfairly. Because it may be assumed that the hash is random, this technique should not favor any particular storage location, since it visits each of them before starting over. This technique may be superior to a timer-based decrement because it keeps a fixed total count population across all of the storage locations, representing the most recent history of traffic, and is not subject to changes in behavior as the volume of traffic varies over time.

[0066] A variation of this technique may include randomly selecting a count value to decrement, rather than processing them cyclically. In this variation, if the chosen count value is already zero, then another one could be picked randomly, or the count values in the storage locations following the initially chosen one could be examined in series, until a non-zero count value is found.
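The random variation could be sketched as follows; the function name `decrement_random` and the `linear_probe` flag that switches between the two fallback strategies described above are illustrative assumptions.

```python
import random


def decrement_random(counts: list[int], rng: random.Random,
                     linear_probe: bool = True) -> int:
    """Decrement one randomly chosen non-zero count and return its index.

    If the chosen slot is zero, either scan forward from it (linear_probe=True)
    or keep drawing new random slots (linear_probe=False).
    """
    if not any(counts):
        raise ValueError("all counts are zero")
    size = len(counts)
    i = rng.randrange(size)
    while counts[i] == 0:
        i = (i + 1) % size if linear_probe else rng.randrange(size)
    counts[i] -= 1
    return i
```

Note that forward scanning from a random start slightly favors slots that follow long runs of zeros, whereas redrawing does not; either stays fair in the sense of never pushing a count below zero.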

Exemplary Processing For Unwanted E-Mail Detection/Prevention

[0067] FIGS. 5A-5E are flowcharts of exemplary processing for detecting and/or preventing transmission of unwanted e-mail, such as an e-mail containing a virus or worm, including a polymorphic virus or worm, or an unsolicited commercial e-mail (spam), according to an implementation consistent with the principles of the invention. The processing of FIGS. 5A-5E will be described in terms of a series of acts that may be performed by mail server 120. In implementations consistent with the principles of the invention, some of the acts may be optional and/or performed in an order different than that described. In other implementations, different acts may be substituted for described acts or added to the process.

[0068] Processing may begin when hash processor 410 (FIG. 4) receives, or otherwise observes, an e-mail message (act 502) (FIG. 5A). Hash processor 410 may hash the main text of the message body, excluding any attachments (act 504). When hashing the main text, hash processor 410 may perform one or more conventional hashes covering one or more portions, or all, of the main text. For example, hash processor 410 may perform hash functions on fixed or variable-sized blocks of the main text. It may be beneficial for hash processor 410 to perform multiple hashes on each of the blocks using the same or different hash functions.
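As one way to picture block hashing with several hash functions, the sketch below splits text into fixed-size blocks and derives a family of hash functions from BLAKE2b by varying its `salt` parameter. The function name, the default block size, and the choice of seeded BLAKE2b are assumptions; the text only calls for "conventional hashes". Variable-sized blocks (for example, with content-defined boundaries) are not shown.

```python
import hashlib


def block_hashes(text: bytes, block_size: int = 64, num_functions: int = 3,
                 table_size: int = 1 << 20) -> list[list[int]]:
    """Hash each fixed-size block of `text` with several independent hash functions.

    Returns, per block, a list of `num_functions` indices into a hash memory of
    `table_size` counters.
    """
    results = []
    for start in range(0, len(text), block_size):
        block = text[start:start + block_size]
        indices = []
        for seed in range(num_functions):
            # A distinct 16-byte salt per seed yields an independent hash function.
            h = hashlib.blake2b(block, digest_size=8, salt=seed.to_bytes(16, "little"))
            indices.append(int.from_bytes(h.digest(), "big") % table_size)
        results.append(indices)
    return results
```

Identical blocks always map to identical index lists, which is what lets replicated content accumulate counts in the same storage locations.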

[0069] It may be desirable to pre-process the main text to remove attempts to fool pattern-matching mail filters. An example of this is HyperText Markup Language (HTML) e-mail, where spammers often insert random text strings in HTML comments between or within words of the text. Such e-mail may be referred to as “polymorphic spam” because it attempts to make each message appear unique. This method for evading detection might otherwise defeat the hash detection technique, or other string-matching techniques. Thus, removing all HTML comments from the message before hashing it may be desirable. It might also be useful to delete HTML tags from the message, or apply other specialized, but simple, pre-processing techniques to remove content not actually presented to the user. In general, this pre-processing may be done in addition to, and in parallel with, hashing the unmodified message text, since viruses and worms may be hidden in the non-visible content that the pre-processing removes.
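A minimal sketch of the comment- and tag-stripping step, assuming simple regular expressions are acceptable; a production filter would more likely use a real HTML parser, since regexes do not handle every malformed construct. The function name `visible_text` is hypothetical.

```python
import html
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # non-greedy: one comment at a time
TAG_RE = re.compile(r"<[^>]*>")


def visible_text(body: str, strip_tags: bool = True) -> str:
    """Remove HTML comments (and optionally tags), then decode entities."""
    text = COMMENT_RE.sub("", body)
    if strip_tags:
        text = TAG_RE.sub("", text)
    return html.unescape(text)
```

Two copies of a message that differ only in the random strings hidden in comments reduce to the same visible text, and therefore to the same hash blocks. Per the paragraph above, the raw body would still be hashed separately.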

[0070] Hash processor 410 may also hash any attachments, after first attempting to expand them if they appear to be known types of compressed files (e.g., “zip” files) (act 506). When hashing an attachment, hash processor 410 may perform one or more conventional hashes covering one or more portions, or all, of the attachment. For example, hash processor 410 may perform hash functions on fixed or variable-sized blocks of the attachment. It may be beneficial for hash processor 410 to perform multiple hashes on each of the blocks using the same or different hash functions.
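The "attempt to expand, then hash" step for zip attachments might look like the sketch below, which uses Python's standard `zipfile` module. The function name and the member-count and member-size limits (a basic guard against decompression bombs) are assumptions, not part of the described method.

```python
import io
import zipfile


def expand_attachment(data: bytes, max_members: int = 100,
                      max_member_bytes: int = 10_000_000) -> list[bytes]:
    """Return the byte strings to hash for one attachment.

    If `data` is a readable zip archive, its non-directory members are returned;
    otherwise, or if expansion fails, the raw attachment bytes are returned.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return [data]
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # file_size is the uncompressed size declared in the archive header.
            members = [m for m in zf.infolist()
                       if not m.is_dir() and m.file_size <= max_member_bytes]
            payloads = [zf.read(m) for m in members[:max_members]]
            return payloads or [data]
    except (zipfile.BadZipFile, RuntimeError, NotImplementedError):
        # Corrupt, encrypted, or unsupported archive: hash the raw bytes instead.
        return [data]
```

Each returned payload would then be split into blocks and hashed in the same way as the main text.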

[0071] Hash processor 410 may compare the main text and attachment hashes with the hashes of known viruses, worms, or spam content stored in a hash memory 420 that is pre-loaded with information from known viruses, worms, and spam content (acts 508 and 510). If there are any hits in this hash memory 420, there is some probability that the e-mail message contains a virus or worm or is spam. A known polymorphic virus may have only a small number of hashes that match in this hash memory 420, out of the total number of hash blocks in the message. A non-polymorphic virus may have a very high fraction of its hash blocks hit in hash memory 420. For this reason, storage locations within hash memory 420 that contain entries from polymorphic viruses or worms may be given more weight during the pre-loading process, such as by giving them a high initial suspicion count value.

[0072] A high fraction of hits in this hash memory 420 may cause the message to be marked as a probable known virus/worm or spam. In this case, the e-mail message can be sidetracked for remedial action, as described below.
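The two signals described above, the fraction of blocks that hit the pre-loaded memory and a weighted score that lets a few polymorphic-virus hits count heavily, can be combined as in this sketch. The `known` mapping from table index to initial suspicion weight, the verdict labels, and both thresholds are illustrative assumptions.

```python
def known_content_verdict(block_indices: list[int], known: dict[int, int],
                          high_fraction: float = 0.8, score_threshold: int = 50) -> str:
    """Compare a message's hash-block indices with a pre-loaded known-bad memory.

    `known` maps a table index to its initial suspicion weight; entries taken
    from polymorphic samples are assumed to have been loaded with larger weights.
    """
    if not block_indices:
        return "no-match"
    hits = [i for i in block_indices if i in known]
    fraction = len(hits) / len(block_indices)
    score = sum(known[i] for i in hits)
    if fraction >= high_fraction:
        return "probable-known"   # mostly matching: sidetrack for remedial action
    if score >= score_threshold:
        return "suspicious"       # few but heavily weighted hits: investigate further
    return "weak-match" if hits else "no-match"
```

For example, a message with a single hit on a heavily weighted polymorphic entry is classified `"suspicious"` even though almost none of its blocks match.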

[0073] A message with a significant “score” from polymorphic virus/worm hash value hits may or may not be a virus/worm instance, and may be sidetracked for further investigation, or marked as suspicious before forwarding to the recipient. An additional check may also be made to determine the level of suspicion.

[0074] For example, hash processor 410 may hash a concatenation of the From and To header fields of the e-mail message (act 512) (FIG. 5B). Hash processor 410 may then check the suspicion counts in hash memories 420 for the hashes of the main text, any attachments, and the concatenated From/To (act 514). Hash processor 410 may determine whether the main text or attachment suspicion count is significantly higher than the From/To suspicion count (act 516). If so, then the content is appearing much more frequently outside the messages between this set of users (which might otherwise be due to an e-mail exchange with repeated message quotations) and, thus, is much more suspicious.
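Acts 512 through 516 could be sketched as two small helpers. The address normalization (trimming and lowercasing), the use of SHA-256, and the `ratio` that stands in for "significantly higher" are all assumptions made for illustration.

```python
import hashlib


def from_to_hash(from_addr: str, to_addr: str, table_size: int = 1 << 32) -> int:
    """Hash the concatenated From and To fields (act 512) after light normalization."""
    key = from_addr.strip().lower() + "\n" + to_addr.strip().lower()
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big") % table_size


def content_outpaces_pair(content_counts: list[int], pair_count: int,
                          ratio: float = 10.0) -> bool:
    """Act 516: is the content far more widespread than traffic between this pair?"""
    if not content_counts:
        return False
    return max(content_counts) > ratio * max(pair_count, 1)
```

Content that is widely replicated while the From/To pair itself is rare (a low `pair_count`) returns `True`, which corresponds to the "much more suspicious" case above. Repeated quoting within an ongoing conversation between the same two users raises both counts together and returns `False`.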

[0075] When this occurs, hash processor 410 may take remedial action (act 518). The remedial action taken might take different forms, which may be programmable or determined by an operator of mail server 120. For example, hash processor 410 may discard the e-mail. This is not recommended for anything but virtually certain virus/worm/spam identification, such as a perfect match to a known virus.

[0076] As an alternative technique, hash processor 410 may mark the e-mail with a warning in the message body, in an additional header, or in another user-visible annotation, and allow the user to deal with it when it is downloaded. For data that appears to be from an unknown mailing list, a variant of this option is to request the user to send back a reply message to the server, classifying the suspect message as either spam or a mailing list. In the latter case, the mailing list source address can be added to the “known legitimate mailing lists” hash memory 420.

[0077] As another technique, hash processor 410 may subject the e-mail to more sophisticated (and possibly more resource-consuming) detection algorithms to make a more certain determination. This is recommended for potential unknown viruses/worms or possible detection of a polymorphic virus/worm.

[0078] As yet another technique, hash processor 410 may hold the e-mail message in a special area and create a special e-mail message to notify the user of the held message (probably including From and Subject fields). Hash processor 410 may also give instructions on how to retrieve the message.

[0079] As a further technique, hash processor 410 may mark the e-mail message with its suspicion score result, but leave it queued for the user's retrieval. If the user's quota would overflow when a new message arrives, the score of the incoming message and the highest score of the queued messages are compared. If the highest-scoring queued message has a score above a settable threshold, and the new message's score is lower than the threshold, the queued message with the highest score may be deleted from the queue to make room for the new message. Otherwise, if the new message has a score above the threshold, it may be discarded or “bounced” (e.g., the sending e-mail server is told to hold the message and retry it later). Alternatively, if it is desired never to bounce incoming messages, mail server 120 may accept the incoming message into the user's queue and repeatedly delete messages with the highest suspicion score from the queue until the total is below the user's quota again.
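The quota policy above can be sketched as follows, assuming quotas are measured in bytes and the queue is a plain list. The `Queued` record and `deliver` function are hypothetical. Evicting repeatedly until the new message fits, and bouncing a low-scoring message when no suspicious message is available to evict, are assumptions that go slightly beyond the single deletion described in the text.

```python
from dataclasses import dataclass


@dataclass
class Queued:
    msg_id: str
    size: int      # bytes counted against the user's quota
    score: float   # suspicion score attached when the message was queued


def deliver(queue: list[Queued], new: Queued, quota: int, threshold: float,
            never_bounce: bool = False) -> str:
    """Apply the quota policy to one incoming message; mutates `queue`."""
    def used() -> int:
        return sum(m.size for m in queue)

    if used() + new.size <= quota:
        queue.append(new)
        return "accepted"
    if never_bounce:
        # Accept, then drop the most suspicious messages until back within quota.
        queue.append(new)
        while queue and used() > quota:
            queue.remove(max(queue, key=lambda m: m.score))
        return "accepted-with-eviction"
    if new.score < threshold:
        # Make room by deleting queued messages whose scores exceed the threshold.
        while queue and used() + new.size > quota:
            worst = max(queue, key=lambda m: m.score)
            if worst.score <= threshold:
                break
            queue.remove(worst)
        if used() + new.size <= quota:
            queue.append(new)
            return "accepted"
    # The new message is itself suspicious, or no room could be made.
    return "bounced"
```

In the `never_bounce` branch the incoming message can itself be the one deleted if it carries the highest score, which matches the "accept, then delete the highest scores" behavior described above.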

[0080] As another technique, hash processor 410 may apply hash-based functions as the e-mail message starts arriving from the sending server and determine the message's suspicion score incrementally as the message is read in. If the message has a high enough suspicion score (above a threshold) during the early part of the message, mail server 120 may reject the message, optionally with either a “retry later” or a “permanent refusal” result to the sending server (which one is used may be determined by settable thresholds applied to the total suspicion score, and possibly other factors, such as server load). This results in the unwanted e-mail using up less network bandwidth and fewer receiving server resources, and penalizes servers sending unwanted mail, relative to those that do not.
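Incremental scoring during receipt might be sketched as below: data arrives in chunks, complete fixed-size blocks are hashed as soon as they are available, and scoring stops as soon as the running score reaches the rejection threshold. Using `zlib.crc32` as the block hash and a plain dict as the count table are simplifying assumptions; the mapping of a rejection to an SMTP "retry later" or "permanent refusal" reply is left to the caller.

```python
import zlib
from typing import Iterable


def score_while_receiving(chunks: Iterable[bytes], counts: dict[int, int],
                          block_size: int, reject_at: int) -> tuple[str, int]:
    """Score a message block by block as it arrives; stop early on a high score."""
    buffer = b""
    score = 0
    for chunk in chunks:
        buffer += chunk
        # Hash every complete block that is now available.
        while len(buffer) >= block_size:
            block, buffer = buffer[:block_size], buffer[block_size:]
            score += counts.get(zlib.crc32(block), 0)
            if score >= reject_at:
                return "reject", score  # the remaining chunks are never read
    if buffer:
        score += counts.get(zlib.crc32(buffer), 0)  # trailing partial block
    return ("reject" if score >= reject_at else "accept"), score
```

Because the function returns as soon as the threshold is crossed, the rest of the message need never be transferred, which is the bandwidth saving the paragraph describes.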

[0081] If the suspicion count for the main text or any attachment is not significantly higher than the From/To suspicion count (act 516), hash processor 410 may determine whether the main text or any attachment has significant replicated content (non-zero or high suspicion count values for many hash blocks in the text/attachment content in all storage locations of hash memories 420) (act 520) (FIG. 5A). If not, the message is probably a normal user-to-user e-mail. These types of messages may be “passed” without further examination. When appropriate, hash processor 410 may also record the generated hash values by incrementing the suspicion count value in the corresponding storage locations in hash memory 420.

[0082] If the message text is substantially replicated (e.g., greater than 90%), hash processor 410 may check one or more portions of the e-mail message against known legitimate mailing lists within hash memory 420 (act 522) (FIG. 5C). For example, hash processor 410 may hash the From or Sender fields of the e-mail message and compare the resulting hash value(s) to known legitimate mailing lists within hash memory 420. Hash processor 410 may also determine whether the e-mail actually appears to originate from the correct source for the mailing list by examining, for example, the sequence of Received headers. Hash processor 410 may further examine a combination of the From or Sender fields and the recipient address to determine if the recipient has previously received e-mail from the sender. This is typical for mailing lists, but atypical of unwanted e-mail, whose senders normally do not have access to the actual list of recipients for the mailing list. If this examination fails, the message may still be passed on, but marked as “suspicious,” since the recipient may simply be a new subscriber to the mailing list, or the mailings may be infrequent enough not to persist in the hash counters between mailings.

[0083] If there is a match with a legitimate mailing list (act 524), then the message is probably a legitimate mailing list duplicate and may be passed with no further examination. This assumes that the mailing list server employs some kind of filtering to exclude unwanted e-mail (e.g., refusing to forward e-mail that does not originate with a known list recipient or refusing e-mail with attachments).

[0084] If there is no match with any legitimate mailing lists within hash memory 420, hash processor 410 may hash the sender-related fields (e.g., From, Sender, Reply-To) (act 526). Hash processor 410 may then determine the suspicion count for the sender-related hashes in hash memories 420 (act 528).

[0085] Hash processor 410 may determine whether the suspicion counts for the sender-related hashes are similar to the suspicion count(s) for the main text hash(es) (act 530) (FIG. 5D). If both From and Sender fields are present, then the Sender field should match with roughly the same suspicion count value as the message body hash. The From field may or may not match. Such a message may come from a legitimate mailing list that is simply not in the known legitimate mailing lists hash memory 420 (or there may be no known legitimate mailing lists hash memory 420 at all). If only the From field is present, it should match about as well as the message text for a mailing list. If none of the sender-related fields match as well as the message text, the e-mail message may be considered moderately suspicious (probably spam, with a variable and fictitious From address or the like).
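Act 530 reduces to asking whether any sender-related header hash has been seen about as often as the message text. In this sketch, `sender_counts` maps hypothetical header names to their suspicion counts, and reading "similar" as "at least half the text count" is an assumption.

```python
def sender_fields_explain_replication(text_count: int, sender_counts: dict[str, int],
                                      tolerance: float = 0.5) -> bool:
    """Return True if some sender-related header replicates about as much as the text.

    A False result corresponds to the "moderately suspicious" case: the body is
    widely replicated, but no From, Sender, or Reply-To value keeps pace with it.
    """
    if text_count == 0:
        return True  # nothing replicated, nothing to explain
    return any(c >= (1 - tolerance) * text_count for c in sender_counts.values())
```

For a mailing list, the Sender (or From) count climbs with every posting, so it keeps pace with the body. Spam with a varying, fictitious From address leaves every sender-related count far below the body's count.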

[0086] As an additional check, hash processor 410 may hash the concatenation of the sender-related field with the highest suspicion count value and the e-mail recipient's address (act 532). Hash processor 410 may then check the suspicion count for the concatenation in a hash memory 420 used just for this check (act 534). If it matches with a significant suspicion count value (act 536) (FIG. 5E), then the recipient has recently received multiple messages from this source, which makes it probable that it is a mailing list. The e-mail message may then be passed without further examination.

[0087] If the message text or attachments are mostly replicated (e.g., greater than 90% of the hash blocks), but with mostly low suspicion count values in hash memory 420 (act 538), then the message is probably a case of a small-scale replication of a single message to multiple recipients. In this case, the e-mail message may be passed without further examination.

[0088] If the message text or attachments contain some significant degree of content replication (say, greater than 50% of the hash blocks) and at least some of the hash values have high suspicion count values in hash memory 420 (act 540), then the message is fairly likely to be a virus/worm or spam. A virus or worm should be considered more likely if the high-count matches are in an attachment. If the highly replicated content is in the message text, then the message is more likely to be spam, though it is possible that e-mail text employing a scripting language (e.g., JavaScript) might also contain a virus.
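The replication tests of acts 520, 538, and 540 can be approximated with the example percentages given above. This sketch omits the mailing-list checks (acts 522 to 536) that the flow places between them. The 50% cut-off for act 520, the `high_count` value, and reading "mostly low" as "fewer than half the blocks are high" are assumptions.

```python
def classify_replication(block_counts: list[int], in_attachment: list[bool],
                         high_count: int = 100) -> str:
    """Rough sketch of acts 520, 538 and 540.

    block_counts[i] is the suspicion count of hash block i; in_attachment[i]
    says whether that block came from an attachment.
    """
    n = len(block_counts)
    if n == 0:
        return "pass"
    replicated = sum(1 for c in block_counts if c > 0) / n
    high = [att for c, att in zip(block_counts, in_attachment) if c >= high_count]
    if replicated <= 0.5:
        return "pass"                      # act 520: normal user-to-user e-mail
    if replicated > 0.9 and len(high) < n / 2:
        return "pass"                      # act 538: small-scale replication
    if high:
        # Act 540: high counts in an attachment point to a virus/worm, else spam.
        return "likely-virus-or-worm" if any(high) else "likely-spam"
    return "pass"
```

The `any(high)` test reflects the observation that high-count matches in an attachment make a virus or worm more likely, while high counts confined to the text point toward spam.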

[0089] If the replication is in the message text, and the suspicion count is substantially higher for the message text than for the From field, the message is likely to be spam (because spammers generally vary the From field to evade simpler spam filters). A similar check can be made for the concatenation of the From and To header fields, except that in this case, it is most suspicious if the From/To hash misses (finds a zero suspicion count). A miss indicates that the sender does not ordinarily send e-mail to that recipient, making the message unlikely to be from a mailing list and very likely to be spam (because spammers normally employ random or fictitious From addresses).
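The two spam indicators in the paragraph above can be collected as in the following sketch. The signal names are hypothetical, and `ratio` is an assumed stand-in for "substantially higher".

```python
def spam_signals(text_count: int, from_count: int, from_to_count: int,
                 ratio: float = 10.0) -> list[str]:
    """Return the spam indicators that apply to replicated message text."""
    signals = []
    # Text replicated far more widely than its From value: the From field varies.
    if text_count > ratio * max(from_count, 1):
        signals.append("text-outpaces-from")
    # From/To pair never seen before: the sender does not normally write here.
    if text_count > 0 and from_to_count == 0:
        signals.append("sender-never-wrote-recipient")
    return signals
```

For example, widely replicated text from a rarely seen From address, sent to a recipient who has never received mail from it, triggers both signals.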

[0090] In the above cases, hash processor 410 may take remedial action (act 542). The particular type of action taken by hash processor 410 may vary as described above.

CONCLUSION

[0091] Systems and methods consistent with the present invention provide mechanisms within an e-mail server to detect and/or prevent transmission of unwanted e-mail, such as e-mail containing viruses or worms, including polymorphic viruses and worms, and unsolicited commercial e-mail (spam).

[0092] Implementation of a hash-based detection mechanism in an e-mail server at the e-mail message level provides advantages over a packet-based implementation in a router or other network node device. For example, the entire e-mail message has been re-assembled, both at the packet level (i.e., IP fragment re-assembly) and at the application level (multiple packets into a complete e-mail message). Also, the hashing algorithm can be applied more intelligently to specific parts of the e-mail message (e.g., header fields, message body, and attachments). Attachments that have been compressed for transport (e.g., “.zip” files) can be expanded for inspection. Without doing this, a polymorphic virus could easily hide inside such files with no repeatable hash signature visible at the packet transport level.

[0093] With the entire message available for a single pass of the hashing process, packet boundaries and packet fragmentation do not split sequences of bytes that might otherwise provide useful hash signatures. A clever attacker might otherwise obscure a virus/worm attack by causing the IP packets carrying the malicious code to be fragmented into pieces smaller than those for which the hashing process is effective, or by forcing packet breaks in the middle of otherwise-visible fixed sequences of code in the virus/worm. Also, the entire message is likely to be longer than a single packet, thereby reducing the probability of false alarms (possibly due to insufficient hash-block sample size and too few hash blocks per packet) and increasing the probability of correct identification of a virus/worm (more hash blocks will match per message than per packet, since packets will be only parts of the entire message).

[0094] Also, fewer hash-block alignment issues arise when the hash blocks can be intelligently aligned with fields of the e-mail message, such as the start of the message body, or the start of an attachment block. This results in faster detection of duplicate contents than if the blocks are randomly aligned (as is the case when the method is applied to individual packets).

[0095] E-mail-borne malicious code, such as viruses and worms, also usually includes a text message designed to cause the user to read the message and/or perform some other action that will activate the malicious code. It is harder for such text to be polymorphic, because automatic scrambling of the user-visible text will either render it suspicious-looking, or will be very limited in variability. This fact, combined with the ability to start a hash block at the start of the message text by parsing the e-mail header, reduces the variability in hash signatures of the message, making it easier to detect with fewer examples seen.

[0096] Further, the ability to extract and hash specific headers from an e-mail message separately may be used to help classify the type of replicated content the message body carries. Because many legitimate cases of message replication exist (e.g., topical mailing lists, such as Yahoo Groups), intelligent parsing and hashing of the message headers is very useful to reduce the false alarm rate, and to increase the accuracy of detection of real viruses, worms, and spam.

[0097] Unlike detection techniques that extract and save fixed strings to be searched for in other pieces of e-mail, this detection technique uses hash-based filters that are one-way functions: given a piece of text, it is possible to determine whether it has been seen before in another message, but given the state data contained in the filter, it is virtually impossible to reconstruct a prior message, or any piece of a prior message, that has been passed through the filter. Thus, this technique can maintain the privacy of e-mail, without retaining any information that can be attributed to a specific sender or receiver.

[0098] The foregoing description of preferred embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention.

[0099] For example, systems and methods have been described with regard to a mail server. In other implementations, the systems and methods described herein may be used within other devices, such as a mail client. In such a case, the mail client may periodically obtain suspicion count values for its hash memory from one or more network devices, such as a mail server.

[0100] It may be possible for multiple mail servers to work together to detect and prevent unwanted e-mails. For example, high-scoring entries from the hash memory of one mail server might be distributed to other mail servers, as long as all of the cooperating servers use the same hash functions. This may accelerate the detection process, especially for mail servers that experience relatively low volumes of traffic.
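One possible way to fold a peer's high-scoring entries into local state is sketched below, assuming both servers keep hash memories with identical layout (same hash functions and table size, so that an index means the same content on both). Keeping the maximum rather than adding counts is an assumption chosen to avoid inflating counts when entries are exchanged repeatedly.

```python
def merge_peer_entries(local: list[int], peer: list[int], min_score: int) -> int:
    """Raise local counters to a peer's high-scoring values; return how many changed."""
    if len(local) != len(peer):
        raise ValueError("hash memories must have the same layout")
    raised = 0
    for i, score in enumerate(peer):
        # Only import entries the peer considers significant.
        if score >= min_score and score > local[i]:
            local[i] = score
            raised += 1
    return raised
```

A low-traffic server that receives such entries can flag replicated content that it has seen only once or twice itself.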

[0101] Further, certain portions of the invention have been described as “blocks” that perform one or more functions. These blocks may include hardware, such as an ASIC or an FPGA, software, or a combination of hardware and software.

[0102] No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. The scope of the invention is defined by the claims and their equivalents.

What is claimed is:
 1. A method for detecting transmission of potentially unwanted e-mail messages, comprising: receiving a plurality of e-mail messages; generating hash values, as generated hash values, based on one or more portions of the plurality of e-mail messages; determining whether the generated hash values match hash values associated with prior e-mail messages; and determining that one of the plurality of e-mail messages is a potentially unwanted e-mail message when one or more of the generated hash values associated with the one of the plurality of e-mail messages match one or more of the hash values associated with the prior e-mail messages.
 2. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on a plurality of variable-sized blocks of a main text of the plurality of e-mail messages.
 3. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on a plurality of fixed-sized blocks of a main text of the plurality of e-mail messages.
 4. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on a main text of the plurality of e-mail messages using a plurality of different hash functions.
 5. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on a main text of the plurality of e-mail messages using a same hash function.
 6. The method of claim 1, wherein the generating hash values includes: attempting to expand an attachment of the plurality of e-mail messages, and hashing the attachment after attempting to expand the attachment.
 7. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on a plurality of variable-sized blocks of an attachment of the plurality of e-mail messages.
 8. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on a plurality of fixed-sized blocks of an attachment of the plurality of e-mail messages.
 9. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on an attachment of the plurality of e-mail messages using a plurality of different hash functions.
 10. The method of claim 1, wherein the generating hash values includes: performing a plurality of hashes on an attachment of the plurality of e-mail messages using a same hash function.
 11. The method of claim 1, further comprising: comparing the generated hash values to hash values corresponding to known unwanted e-mails.
 12. The method of claim 11, wherein the known unwanted e-mails include at least one of e-mails containing a virus, e-mails containing a worm, and unsolicited commercial e-mails.
 13. The method of claim 1, wherein the generating hash values includes: hashing at least one of a main text and an attachment to generate one or more first hash values, and hashing a concatenation of first and second header fields to generate a second hash value.
 14. The method of claim 13, wherein the first and second header fields include a From header field and a To header field.
 15. The method of claim 13, wherein the determining whether the generated hash values match hash values associated with prior e-mail messages includes: determining a first suspicion count based on a number of the hash values associated with the prior e-mail messages that match the one or more first hash values, and determining a second suspicion count based on a number of the hash values associated with the prior e-mail messages that match the second hash value.
 16. The method of claim 15, wherein the determining that one of the plurality of e-mail messages is a potentially unwanted e-mail message includes: determining that the one of the plurality of e-mail messages is a potentially unwanted e-mail message when the first suspicion count is significantly higher than the second suspicion count.
 17. The method of claim 1, further comprising: taking remedial action when the one of the plurality of e-mail messages is a potentially unwanted e-mail message, the taking remedial action including at least one of: discarding the one of the plurality of e-mail messages, bouncing the one of the plurality of e-mail messages, marking the one of the plurality of e-mail messages with a warning, subjecting the one of the plurality of e-mail messages to a virus or worm detection process, creating a notification message, and generating a suspicion score for the one of the plurality of e-mail messages and using the suspicion score to identify further processing for the one of the plurality of e-mail messages.
 18. The method of claim 1, further comprising: generating a suspicion score for the plurality of e-mail messages based on a result of the determination of whether the generated hash values match hash values associated with prior e-mail messages; and taking remedial action when the one of the plurality of e-mail messages is a potentially unwanted e-mail message, the taking remedial action including: determining whether a newly received e-mail message exceeds a mail quota, identifying an earlier-received e-mail message with a highest suspicion score, determining whether the suspicion score of the newly received e-mail message is lower than the suspicion score of the earlier-received e-mail message when the newly received e-mail message exceeds the mail quota, deleting the earlier-received e-mail message when the suspicion score of the newly received e-mail message is lower than the suspicion score of the earlier-received e-mail message, and storing the newly received e-mail message.
 19. The method of claim 1, wherein the generating hash values and the determining whether the hash values match hash values associated with prior e-mail messages are performed incrementally as the plurality of e-mail messages are being received.
 20. The method of claim 19, further comprising: generating a suspicion score for the plurality of e-mail messages based on a result of the determination of whether the generated hash values match hash values associated with prior e-mail messages; and taking remedial action when the suspicion score of an e-mail message of the plurality of e-mail messages is above a threshold, the taking remedial action including rejecting the e-mail message.
 21. The method of claim 20, wherein the rejecting occurs before the e-mail message is completely received.
 22. The method of claim 1, further comprising: comparing the generated hash values to known legitimate mailing lists; and passing the plurality of e-mail messages without further examination when the generated hash values match one or more of the known legitimate mailing lists.
 23. The method of claim 22, wherein the comparing the generated hash values includes: determining whether the plurality of e-mail messages originated from the known legitimate mailing lists.
 24. The method of claim 1, wherein the generating hash values includes: hashing a main text to generate a first hash value, and hashing sender-related header fields to generate one or more second hash values.
 25. The method of claim 24, wherein the sender-related header fields include at least one of a From header field, a Sender header field, and a Reply-To header field.
 26. The method of claim 24, wherein the determining whether the generated hash values match hash values associated with prior e-mail messages includes: determining a first suspicion count based on a number of the hash values associated with the prior e-mail messages that match the first hash value, and determining one or more second suspicion counts based on a number of the hash values associated with the prior e-mail messages that match the one or more second hash values.
 27. The method of claim 26, wherein the determining that one of the plurality of e-mail messages is a potentially unwanted e-mail message includes: determining that the one of the plurality of e-mail messages is a potentially unwanted e-mail message when the first suspicion count is higher than the one or more second suspicion counts.
 28. The method of claim 1, wherein the generating hash values includes: hashing a main text of the plurality of e-mail messages to generate a main text hash, and hashing at least one header field of the plurality of e-mail messages to generate at least one header hash.
 29. The method of claim 28, wherein the determining whether the generated hash values match hash values associated with prior e-mail messages includes: determining whether the main text hash matches a substantially higher number of the hash values associated with the prior e-mail messages than the at least one header hash; and wherein the determining that one of the plurality of e-mail messages is a potentially unwanted e-mail message includes: determining that the one of the plurality of e-mail messages is a potentially unwanted e-mail message when the main text hash matches a substantially higher number of the hash values associated with the prior e-mail messages than the at least one header hash.
 30. A system for detecting transmission of potentially unwanted e-mails, comprising: means for observing a plurality of e-mails; means for hashing one or more portions of the plurality of e-mails to generate hash values, as generated hash values; means for determining whether the generated hash values match hash values associated with prior e-mails; and means for determining that the plurality of e-mails are potentially unwanted e-mails when one or more of the generated hash values match one or more of the hash values associated with the prior e-mails.
 31. A mail server, comprising: one or more hash memories configured to store count values associated with a plurality of hash values; and a hash processor configured to: receive an e-mail message, hash one or more portions of the e-mail message to generate hash values, as generated hash values, increment the count values corresponding to the generated hash values, as incremented count values, and determine whether the e-mail message is a potentially unwanted e-mail message based on the incremented count values.
 32. The server of claim 31, wherein when hashing one or moreportions of the e-mail message, the hash processor is configured toperform a plurality of hashes on a plurality of variable-sized blocks ofa main text of the e-mail message.
 33. The server of claim 31, whereinwhen hashing one or more portions of the e-mail message, the hashprocessor is configured to perform a plurality of hashes on a pluralityof fixed-sized blocks of a main text of the e-mail message.
 34. Theserver of claim 31, wherein when hashing one or more portions of thee-mail message, the hash processor is configured to perform a pluralityof hashes on a main text of the e-mail message using a plurality ofdifferent hash functions.
 35. The server of claim 31, wherein whenhashing one or more portions of the e-mail message, the hash processoris configured to: attempt to expand an attachment of the e-mail message,and hash the attachment after attempting to expand the attachment. 36.The server of claim 31, wherein when hashing one or more portions of thee-mail message, the hash processor is configured to perform a pluralityof hashes on a plurality of variable-sized blocks of an attachment ofthe e-mail message.
37. The server of claim 31, wherein when hashing one or more portions of the e-mail message, the hash processor is configured to perform a plurality of hashes on a plurality of fixed-sized blocks of an attachment of the e-mail message.
38. The server of claim 31, wherein when hashing one or more portions of the e-mail message, the hash processor is configured to perform a plurality of hashes on an attachment of the e-mail message using a plurality of different hash functions.
39. The server of claim 31, wherein the hash processor is further configured to compare the generated hash values to hash values corresponding to known unwanted e-mails.
40. The server of claim 39, wherein the known unwanted e-mails include at least one of e-mails containing a virus, e-mails containing a worm, and unsolicited commercial e-mails.
41. The server of claim 31, wherein when hashing one or more portions of the e-mail message, the hash processor is configured to: hash at least one of a main text and an attachment of the e-mail message to generate one or more first hash values, and hash a concatenation of first and second header fields of the e-mail message to generate a second hash value.
42. The server of claim 41, wherein the first and second header fields include a From header field and a To header field.
43. The server of claim 41, wherein when determining whether the e-mail message is a potentially unwanted e-mail message, the hash processor is configured to identify the e-mail message as a potentially unwanted e-mail message when the count value corresponding to one or more first hash values is significantly higher than the count value corresponding to the second hash value.
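Claims 41 to 43 compare a count for the message content against a count for the concatenated From and To header fields. In the sketch below, a `collections.Counter` stands in for the hash memories, and "significantly higher" is read as a hypothetical ratio of at least `ratio`. Both choices are assumptions and are not part of the claims; `check` is an illustrative name.

```python
import hashlib
from collections import Counter


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def check(counts: Counter, main_text: str, from_hdr: str, to_hdr: str,
          ratio: float = 10.0) -> bool:
    """Count one message and report whether its main-text count is much
    higher than the count for its From+To concatenation."""
    first = "text:" + sha256_hex(main_text)            # first hash value
    second = "hdr:" + sha256_hex(from_hdr + to_hdr)    # second hash value (From || To)
    counts[first] += 1
    counts[second] += 1
    # Assumed reading of "significantly higher": at least `ratio` times higher.
    return counts[first] >= ratio * counts[second]
```

The intuition behind this sketch is as follows. The same body sent once each to many different sender/recipient pairs looks like mass propagation. The same body exchanged repeatedly between one pair, by contrast, keeps both counts in step.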
44. The server of claim 31, wherein the hash processor is further configured to take remedial action when the e-mail message is a potentially unwanted e-mail message, and wherein, when taking remedial action, the hash processor is configured to perform at least one of the following: discard the e-mail message, bounce the e-mail message, mark the e-mail message with a warning, subject the e-mail message to a virus or worm detection process, create a notification message, and generate a suspicion score for the e-mail message and use the suspicion score to identify further processing for the e-mail message.
45. The server of claim 31, wherein the hash processor is further configured to: generate a suspicion score for the e-mail message based on the incremented count values, determine whether a newly received e-mail message exceeds a mail quota, identify an earlier-received e-mail message with a highest suspicion score, determine whether a suspicion score of the newly received e-mail message is lower than the suspicion score of the earlier-received e-mail message when the newly received e-mail message exceeds the mail quota, delete the earlier-received e-mail message when the suspicion score of the newly received e-mail message is lower than the suspicion score of the earlier-received e-mail message, and store the newly received e-mail message.
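Claim 45 describes quota handling driven by suspicion scores. The following minimal sketch makes three assumptions: the quota is a message count, scores have already been computed, and a newly received message that does not score lower than the most suspicious stored message is refused. The claim does not say what happens in that last case. `Mailbox` and `store` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Hypothetical mailbox with a quota measured in messages."""
    quota: int
    stored: dict[str, float] = field(default_factory=dict)  # message id -> suspicion score

    def store(self, msg_id: str, score: float) -> bool:
        """Store a newly received message; return True if it was kept."""
        if len(self.stored) < self.quota:
            self.stored[msg_id] = score
            return True
        # Over quota: find the earlier-received message with the highest score.
        worst_id = max(self.stored, key=self.stored.__getitem__)
        if score < self.stored[worst_id]:
            del self.stored[worst_id]      # delete the more suspicious message
            self.stored[msg_id] = score    # and keep the new one
            return True
        return False  # assumption: otherwise the new message is not stored
```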
46. The server of claim 31, wherein the hash processor is configured to hash the one or more portions of the e-mail message and increment the count values incrementally as the e-mail message is being received.
47. The server of claim 46, wherein the hash processor is further configured to: generate a suspicion score for the e-mail message based on the incremented count values, and reject the e-mail message when the suspicion score of the e-mail message is above a threshold.
48. The server of claim 47, wherein the rejecting occurs before the e-mail message is completely received.
49. The server of claim 31, wherein the hash processor is further configured to: compare the generated hash values to known legitimate mailing lists, and pass the e-mail message without further examination when the generated hash values match one of the known legitimate mailing lists.
50. The server of claim 49, wherein the hash processor is configured to: determine whether the e-mail message originated from one of the known legitimate mailing lists.
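Claims 46 to 48 update counts while the message is still arriving and can reject it before it is complete. The sketch below makes three assumptions: portions are single lines, a `Counter` serves as the hash memory, and the suspicion score is the number of lines already seen in earlier messages. `IncrementalScorer` and `feed` are illustrative names.

```python
import hashlib
from collections import Counter


class IncrementalScorer:
    """Hypothetical receiver that hashes each complete line as it arrives."""

    def __init__(self, counts: Counter, threshold: int = 5) -> None:
        self.counts = counts          # shared hash memory (count values)
        self.threshold = threshold    # assumed rejection threshold
        self.buffer = b""             # partial line not yet hashed
        self.score = 0                # suspicion score so far

    def feed(self, chunk: bytes) -> bool:
        """Consume one received chunk; return False once the message should be rejected."""
        self.buffer += chunk
        *lines, self.buffer = self.buffer.split(b"\n")
        for line in lines:
            key = hashlib.sha256(line).hexdigest()
            self.counts[key] += 1
            # Assumed score: number of lines already seen in earlier messages.
            if self.counts[key] > 1:
                self.score += 1
        return self.score <= self.threshold
```

A receiving server could stop accepting data as soon as `feed` returns `False`. In this sketch, that corresponds to rejecting the message before it is completely received.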
51. The server of claim 31, wherein the hash processor is configured to: hash a main text of the e-mail message to generate a first hash value, and hash sender-related header fields of the e-mail message to generate one or more second hash values.
52. The server of claim 51, wherein the sender-related header fields include at least one of a From header field, a Sender header field, and a Reply-To header field.
53. The server of claim 51, wherein when determining whether the e-mail message is a potentially unwanted e-mail message, the hash processor is configured to identify the e-mail message as a potentially unwanted e-mail message when the count value corresponding to the first hash value is higher than the count values corresponding to the one or more second hash values.
54. The server of claim 31, wherein when hashing one or more portions of the e-mail message, the hash processor is configured to: perform a plurality of hashes on a main text of the e-mail message to generate main text hashes, and hash at least one header field of the e-mail message to generate at least one header hash.
55. The server of claim 54, wherein when determining whether the e-mail message is a potentially unwanted e-mail message, the hash processor is configured to: generate a score for the main text based on count values corresponding to the main text hashes and a score for the at least one header field based on the count value corresponding to the at least one header hash, and identify the e-mail message as a potentially unwanted e-mail message when the score for the main text is substantially higher than the score for the at least one header field.
56. A method for detecting transmission of unwanted e-mail messages, comprising: receiving a plurality of e-mail messages; and detecting unwanted e-mail messages from the plurality of e-mail messages based on hashes of previously received e-mail messages, where multiple hashes are performed on each of the plurality of e-mail messages.
57. A method for detecting transmission of potentially unwanted e-mail messages, comprising: receiving an e-mail message; generating a plurality of hash values, as generated hash values, over blocks of the received e-mail message, the blocks including at least two of a main text portion, an attachment portion, and a header portion of the received e-mail message; determining whether the generated hash values match hash values associated with prior e-mail messages; and determining that the received e-mail message is a potentially unwanted e-mail message when one or more of the generated hash values associated with the received e-mail message match one or more of the hash values associated with the prior e-mail messages.
58. The method of claim 57, wherein the blocks are variable-sized blocks of the received e-mail message.
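Claim 57 hashes blocks drawn from at least two of the main text, attachment, and header portions. One way to separate those portions in Python is the standard `email` package with `policy.default`, sketched below. Several choices are assumptions made for illustration: the From and To fields are concatenated to form the header block, only the plain-text body serves as the main text, and the function is named `portion_hashes`.

```python
import email
import hashlib
from email import policy


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def portion_hashes(raw: bytes) -> dict[str, list[str]]:
    """Hash header, main-text, and attachment blocks of a raw message."""
    msg = email.message_from_bytes(raw, policy=policy.default)

    # Header block: concatenation of the From and To fields (assumed choice).
    header = f"{msg['From']}\n{msg['To']}".encode("utf-8")

    # Main text: the preferred plain-text body part, if any.
    body = msg.get_body(preferencelist=("plain",))
    main_text = body.get_content().encode("utf-8") if body is not None else b""

    # Attachments: decoded payload bytes of each attachment part.
    attachments = [part.get_payload(decode=True) or b"" for part in msg.iter_attachments()]

    return {
        "header": [sha256_hex(header)],
        "main_text": [sha256_hex(main_text)],
        "attachments": [sha256_hex(a) for a in attachments],
    }
```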
59. In a network of cooperating mail servers, one of the mail servers comprising: one or more hash memories configured to store information relating to hash values corresponding to previously-observed e-mails; and a hash processor configured to: receive at least some of the hash values from another one or more of the cooperating mail servers, store information relating to the at least some of the hash values in at least one of the one or more hash memories, receive an e-mail message, hash one or more portions of the received e-mail message to generate hash values, as generated hash values, determine whether the generated hash values match the hash values corresponding to previously-observed e-mails, and identify the received e-mail message as a potentially unwanted e-mail message when one or more of the generated hash values associated with the received e-mail message match one or more of the hash values corresponding to previously-observed e-mails.
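For claim 59, the shared information can be as simple as a set of hash values. The following minimal sketch assumes that peers exchange SHA-256 hex digests of message portions and that each server keeps them in an in-memory set. `CooperatingServer` and its methods are hypothetical.

```python
import hashlib


class CooperatingServer:
    """Hypothetical mail server that shares hash values with peers."""

    def __init__(self) -> None:
        self.known: set[str] = set()   # hash values of previously observed e-mails

    @staticmethod
    def hash_portion(portion: bytes) -> str:
        return hashlib.sha256(portion).hexdigest()

    def receive_from_peer(self, hash_values: set[str]) -> None:
        # Store hash values reported by another cooperating server.
        self.known.update(hash_values)

    def check(self, portions: list[bytes]) -> bool:
        # Flag the message if any generated hash matches a previously observed one.
        generated = {self.hash_portion(p) for p in portions}
        suspicious = bool(generated & self.known)
        self.known |= generated        # remember this message locally as well
        return suspicious
```

Exchanging hashes rather than message text lets servers share what they have seen without forwarding the content itself.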
60. A mail server, comprising: one or more hash memories configured to store count values associated with a plurality of hash values; and a hash processor configured to: receive e-mail messages, hash one or more portions of the received e-mail messages to generate hash values, as generated hash values, increment the count values corresponding to the generated hash values, as incremented count values, and generate suspicion scores for the received e-mail messages based on the incremented count values.
61. The server of claim 60, wherein the hash processor is further configured to: maintain a counter corresponding to each of the one or more hash memories, and decrement ones of the count values based on the counter.
62. The server of claim 61, wherein the hash processor is configured to: determine when a value of the counter reaches a threshold, and decrement one of the count values each time another one of the count values is incremented after the value of the counter reaches the threshold.
63. The server of claim 62, wherein the hash processor is further configured to: identify a count value to decrement, determine whether the identified count value is non-zero, and decrement the identified count value when the identified count value is non-zero.
64. The server of claim 63, wherein the hash processor is further configured to: examine next sequential ones of the count values until a non-zero count value is found when the identified count value is zero, and decrement the non-zero count value.
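Claims 61 to 64 keep the count values from growing without bound. The sketch below makes three assumptions, all of which the claims leave open: the per-memory counter tracks total increments, the count value to decrement is picked at random, and the scan for the next non-zero value wraps around to the start of the table. `DecayingHashMemory` is a hypothetical name.

```python
import random


class DecayingHashMemory:
    """Hypothetical hash memory whose count values decay once the
    per-memory counter reaches `threshold`."""

    def __init__(self, size: int, threshold: int, seed: int = 0) -> None:
        self.counts = [0] * size
        self.threshold = threshold
        self.counter = 0                  # increments absorbed before saturation
        self.rng = random.Random(seed)    # assumed: random choice of value to decrement

    def increment(self, index: int) -> None:
        saturated = self.counter >= self.threshold
        self.counts[index] += 1
        if saturated:
            self._decrement_one()         # one decrement per increment from now on
        else:
            self.counter += 1

    def _decrement_one(self) -> None:
        size = len(self.counts)
        start = self.rng.randrange(size)  # identify a count value to decrement
        for step in range(size):
            j = (start + step) % size     # examine next sequential values, wrapping around
            if self.counts[j] > 0:
                self.counts[j] -= 1
                return
```

Under these assumptions, once the counter reaches `threshold`, the sum of all count values stays fixed at `threshold`. Older observations therefore gradually age out as new ones arrive.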
65. A method for preventing transmission of unwanted e-mail messages, comprising: receiving an e-mail message; generating a plurality of hash values, as generated hash values, over portions of the e-mail message as the e-mail message is being received; incrementally determining whether the generated hash values match hash values associated with prior e-mail messages; generating a suspicion score for the e-mail message based on the incremental determining; and rejecting the e-mail message when the suspicion score of the e-mail message is above a threshold.
66. The method of claim 65, wherein the rejecting occurs before the e-mail message is completely received.